feat(async): whole-app tokio runtime — full async migration (#813)#823
Conversation
#813 Phase A) Phase A of the full-async runtime conversion (#813). Converts the shared subprocess runner and response-file writer in `fbuild-core` from sync `std::process::Command` / `std::fs` to async `tokio::process::Command` / `tokio::fs`, while preserving the existing Job Object + `PR_SET_PDEATHSIG` containment story. * `crates/fbuild-core/src/subprocess.rs` - `run_command`, `run_command_with_stdin`, `run_command_passthrough` are now `async fn` returning `Result<...>` instead of blocking `fn`s. - Internals: tokio `Command` routed through `containment::tokio_spawn::spawn_contained` so daemon-spawned children still inherit Job-Object kill-on-close (Windows) / per-child pgid + `PR_SET_PDEATHSIG` (Linux). - Stdin/stdout/stderr drain concurrently via the tokio reactor, same Windows pipe-buffer deadlock fix as before (#141). - Timeouts implemented via `tokio::time::timeout`; on expiry the child is killed before returning `FbuildError::Timeout`. - Added sync escape hatches `run_command_blocking`, `run_command_with_stdin_blocking`, `run_command_passthrough_blocking` for diagnostic CLI subcommands and tests that aren't on the tokio runtime. - Preserved `ToolOutput` shape byte-for-byte (lossy-UTF-8 lines joined by `\n` with trailing newline) so downstream parsers keep working without changes. * `crates/fbuild-core/src/response_file.rs` - `write_response_file` is now `async fn`, using `tokio::fs` for directory creation, existence probing, and writes. - `write_response_file_blocking` added for sync callers; both paths share `render_response_file` so the on-disk bytes are identical. * `crates/fbuild-core/src/containment.rs` - Doc comment updated to call out that `subprocess::run_command` is now async and routes through `tokio_spawn::spawn_contained`. - No behavioural changes to the containment primitives themselves. * `crates/fbuild-core/Cargo.toml` - Added `async-trait` dep so downstream crates can pull it transitively without each one redeclaring the workspace dep. * `crates/fbuild-core/src/install_status.rs` - Intentionally left sync: the subscriber is a fire-and-forget status broadcast, not a heavy op. Switching it to async would pin the subscriber to a tokio runtime for zero behavioural benefit. Workspace `Cargo.toml` already had `async-trait = "0.1"` in `[workspace.dependencies]`, so no root-level changes needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…821 / #813 Phase A) Agent B scope of the #813 full-async migration. Every HTTP call, package install, and library compile in fbuild-packages now runs on the daemon's tokio reactor instead of a per-call standalone runtime or a blocking reqwest::blocking::get. Highlights: - `src/http.rs`: shared `reqwest::Client` with 5-min request timeout and 30-sec connect timeout. Centralizes timeout policy (audit per #805) so no fbuild-packages call site builds its own client. - `Package` trait → `#[async_trait] async fn ensure_installed`. All 32 impls (toolchain/* + library/*) converted. Per-impl bridges that built a fresh `tokio::runtime::Runtime` per install are gone; everything just `.await`s `staged_install`. `Send` bound added to `staged_install`'s `validate` closure so async-trait Futures stay Send. - `downloader.rs`, `library/arduino_api.rs`, `library/registry.rs`, `toolchain/clang.rs`, `toolchain/avr.rs` (test): every `reqwest::get` / `reqwest::blocking::get` rewired through `http::client()`. Last `reqwest::blocking::*` call removed from the crate. - `library/library_compiler.rs`: `compile_library_with_jobs` is async. Parallel build path replaced `std::thread::scope` + `Mutex<Iter>` with `tokio::task::JoinSet` + `tokio::sync::Semaphore` bounding concurrency to `jobs`. `compile_one_source` / `archive_objects` now `.await run_command` (already async in fbuild-core). - `library/library_manager.rs`: download pipeline uses `JoinSet` instead of `Vec<JoinHandle>`. `ensure_libraries_sync` kept as a legacy bridge (block_in_place when on a runtime, fresh runtime otherwise) for fbuild-build call sites not yet converted in this PR slice. - `library/esp32_framework/libs.rs`: `ensure_libs` / `ensure_mcu_libs` are async. Cross-crate concern: orchestrator callers in fbuild-build/src/esp32/orchestrator/packages.rs need `.await` (Agent C scope). - `disk_cache/budget.rs`: disk-space probe uses `run_command_blocking` (sync helper) since `compute()` is called from sync GC startup paths. - `lnk/resolver::resolve` and `toolchain/esp32_metadata::resolve_toolchain_url_sync` keep their sync signatures (consumers in fbuild-build / fbuild-cli not in this slice) but the inline runtime-creation logic is now scoped to those two bridges instead of the workspace-wide `block_on_package_future` helper, which is deleted. Cross-crate concerns surfaced (these break fbuild-build / fbuild-daemon until coordinating agents land their changes): - `Package::ensure_installed` is now async — every call site in fbuild-build/, fbuild-daemon/, and fbuild-build/tests/ needs `.await`. - `Esp32Framework::ensure_libs` / `ensure_mcu_libs` are async — esp32 orchestrator `packages.rs` needs `.await`. - `EspQemu::resolve_executable` is now async — emulator handlers need `.await`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Phase B)
Convert Agent D's slice of fbuild#813 to async/await:
* fbuild-deploy `Deployer` trait now async via `#[async_trait]`.
`deploy()` and `post_deploy_recovery()` are both async; the default
`post_deploy_recovery` polls via `tokio::time::sleep` and offloads
the blocking `serialport::open()` probe to `spawn_blocking`.
* Per-platform Deployer impls converted:
- `AvrDeployer` — `run_command().await` (avrdude)
- `LpcDeployer` — `run_command().await` (lpc21isp)
- `TeensyDeployer` — `flash::run_with_retry().await`; the
sync serialport-touching helpers (`soft_reboot::baud_134_trigger`,
`halfkay_probe::wait_for_disappearance`, `port_discovery::
wait_for_new_cdc_port`, `first_byte_probe::probe`) are individually
offloaded to `tokio::task::spawn_blocking` inside `deploy()`. The
retry-loop sleep is now `tokio::time::sleep`.
- `Esp32Deployer` — `run_command().await` for the esptool path;
`try_verify_deployment`, `deploy_regions`, `try_deploy_native`,
`try_deploy_regions_native`, `try_verify_deployment_native`,
and `try_verify_deployment_esptool` are async.
* ESP32 native espflash path: espflash's `Flasher`/`Connection` types
are sync (sync serialport under the hood), so each entry point
(`try_write_deployment_native` / `try_verify_deployment_native`)
is wrapped in `tokio::task::spawn_blocking` from inside the
`Esp32Deployer` async methods. Public espflash API stays sync.
* `native_write_or_fallback` helper refactored from "takes closure
returning sync Result" to "takes Result outcome" so async call
sites can `.await` the native attempt before deciding to fall back.
* Test bodies: `#[test]` -> `#[tokio::test]` + `async fn` for every
test that calls a now-async deployer method. Hardware-gated
per-chip ESP32 verify tests follow the same pattern.
Daemon handler bodies:
* `handlers/operations/build.rs` non-streaming path drops the
`spawn_blocking(|| orchestrator.build(¶ms))` wrapper now that
`BuildOrchestrator::build` is async (Agent C). The streaming path
keeps `tokio::spawn` around the build future so the per-10s
status-heartbeat loop can still drive timeout-based polling.
* `handlers/operations/deploy.rs` collapses both `spawn_blocking`
wrappers: the build call becomes a direct `.await`, and the large
per-platform Deployer dispatch closure becomes an `async {}` block
that ends in `.await`. The `Result<Result<_, FbuildError>, JoinError>`
match shape collapses to `Result<_, FbuildError>` — the "deploy task
panicked" arm is unreachable now and was removed.
* Post-deploy recovery: `deployer.post_deploy_recovery(port).await`
is called directly on the runtime (no more `spawn_blocking`).
Not touched (out of scope / owned by other agents):
* fbuild-build/* (Agent C)
* fbuild-packages/* (Agent B)
* fbuild-core/subprocess (Agent A — assumed to make `run_command`
async; this commit's `.await` calls are written against that
signature)
* fbuild-daemon/src/handlers/emulator/select.rs — calls
`orchestrator.build` and must be updated when Agent C lands
* fbuild-daemon/src/handlers/operations/{install_deps,reset}.rs
— Agent B/C-owned wrappers around `install_platform_deps` and
the sync `fbuild_deploy::reset::reset_device`
* fbuild-cli/* — already async (uses async reqwest::Client); the
`port_scan` diagnostic still uses `reqwest::blocking` on a
dedicated OS thread, intentional per its module comment
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#813 Phase B) Convert the entire `fbuild-build` crate to fully-async per #820 (Phase B of the #813 async runtime conversion). ## Trait conversions - `BuildOrchestrator::build` now `async fn` (via `#[async_trait]`). - `Compiler::compile_one` + default methods (`compile_c`, `compile_cpp`, `compile`) now `async fn`. - `Linker::archive`, `link`, `convert_firmware`, `report_size`, `link_all` now `async fn`. - `PlatformSupport::install_deps` now `async fn`. - `SketchBuilder::build` (compile_many) now `async fn`. ## Subsystem conversions - `compiler::compile_source` → `async fn`. Replaces the `Handle::block_on(svc.compile_blocking(...))` bridge with a direct `svc.compile(...).await` against the new `FbuildZccacheService::compile` async dispatch (the legacy `compile_blocking` shim is kept `#[deprecated]` for transitional use). - `parallel::compile_sources_parallel`: replaced `std::thread::scope` work-stealing with `tokio::task::JoinSet` + `tokio::sync::Semaphore`, preserving the per-file rebuild-signature semantics, the warning collection, and the early-abort-on-error contract. - `compile_many::run_stage2`: replaced `std::thread::scope` pool with `JoinSet` + semaphore (each per-sketch task `.await`s `SketchBuilder::build`). - `pipeline::compile_sources`, `compile_local_libraries`, `compile_project_as_library`, `compile_extra_libraries`, `run_sequential_build_with_libs`, `log_toolchain_version`, `BuildContext::new`/`new_with_perf` all converted to `async fn`. - `linker::LinkerBase::archive`, `report_size`, `analyze_symbols`, `objcopy_firmware` converted to `async fn`; per-platform Linker impls updated to `.await` them. - `symbol_analyzer::analyze_elf` + `run_objdump_and_attribute`, `script_runtime::resolve_extra_script_overlay`, ESP32 orchestrator submodules (`packages.rs`, `framework_libs.rs`, `local_libs.rs`, `boot_artifacts.rs`, `embed.rs`, `embed_stage.rs`) all converted. - `shrink::probe::ExternalPreprocessor::preprocess` keeps its sync trait signature and bridges to the async `run_command_with_stdin` via `tokio::task::block_in_place` + `Handle::block_on` (cold-path, one call per build). - `compiler::compiler_version`: same `block_in_place` bridge for the sync `rebuild_signature` trait method. ## Per-platform impls converted 13 `Compiler` impls (AVR, ESP32, ESP8266, Teensy, ARM, NRF52, CH32V, SAM, Silabs, Renesas, framework_core_cache test), 11 `Linker` impls (AVR, Teensy, ESP32, ESP8266, CH32V, ARM, NRF52, SAM, Silabs, Renesas), 13 `BuildOrchestrator` impls (AVR, Teensy, ESP32, ESP8266, RP2040, STM32 + arduino_mbed, NRF52, NxpLpc, Apollo3, CH32V, SAM, Silabs, Renesas), 13 `PlatformSupport::install_deps` impls. ## Cross-crate dependencies noticed - Assumes Agent A converted `fbuild_core::subprocess::run_command` + `run_command_with_stdin` to `async fn`. - Assumes Agent B converted `fbuild_packages::Package::ensure_installed` to `async fn` via `#[async_trait]` and `fbuild_packages::library::library_compiler::compile_library_with_jobs` to `async fn`. - `fbuild-cli` callers of `symbol_analyzer::analyze_elf` will need to `.await` it.
Sweep cross-crate call sites that still used the old sync signatures after the four parallel async-migration commits (687c5ac, f4e9445, 3cd2448, 177a13c). All call sites either `.await` the new async fn or use the `_blocking` bridge for genuinely-sync test contexts. Changes by crate: CLI (fbuild-cli): - pio.rs: convert `find_pio`, `run_pio_command`, `pio_build`, `pio_deploy`, `pio_monitor` to async (run_command is now async) - daemon_cmd.rs: convert `kill_process`, `find_daemon_pids`, `run_daemon_kill_all` to async (called from async daemon dispatch) - build.rs: convert `open_in_browser` to async - symbols_cmd.rs / graph_cmd.rs / bloat_lookup.rs: `analyze_elf` is now async, propagate `.await` through `run_symbols`, `run_bloat_graph`, `run_bloat_lookup` and update dispatch.rs callers - compile_many.rs: drop the obsolete spawn_blocking wrapper; await the now-async `compile_many` directly Daemon (fbuild-daemon): - operations/install_deps.rs: await async `install_platform_deps`; drop spawn_blocking wrapper - emulator/runners.rs: convert `find_simavr` to async; refactor addr2line resolution to await the toolchain probe - emulator/avr8js_npm.rs: convert `find_node`, `ensure_avr8js_npm`, `ensure_avr8js_npm_in` to async (run_command call) - emulator/avr8js_deploy.rs: await the above - emulator/shared.rs: convert `resolve_esp32_toolchain_gcc_path` to async (Package::ensure_installed is async); await crash decoder - emulator/qemu_deploy.rs: refactor addr2line resolution to await - emulator/select.rs: drop spawn_blocking wrapper around orchestrator.build (now async); flatten Result wrapping - handlers/websockets.rs: await new async process_crash_line Serial (fbuild-serial): - crash_decoder.rs: convert `CrashDecoder::process_line`, `decode`, and `run_addr2line` to async; tests gain #[tokio::test] - manager.rs: convert `process_crash_line` to async; remove-then- reinsert the decoder so the DashMap shard lock isn't held across the addr2line `.await` Build (fbuild-build): - compiler.rs: fix sync wrapper for write_response_file that was silently returning a Future instead of Result - {teensy,arm,esp32}_linker.rs: await response_file::write_response_file - tests/*: convert integration tests to #[tokio::test(flavor = "multi_thread", worker_threads = 4)] and await orchestrator.build, compile_many, compile_many_with, Package::ensure_installed, compile_source - compile_many_two_stage.rs: MockBuilder gets #[async_trait] impl; swap std::thread::Barrier for tokio::sync::Barrier; use tokio::time::timeout instead of std::thread::sleep deadline loop - compiler_tests.rs: use write_response_file_blocking for sync tests - esp32_compiler.rs: convert test_response_file_generation to async - Cargo.toml: add async-trait + tokio (macros, rt-multi-thread, sync) to dev-dependencies for integration tests Packages (fbuild-packages): - library_compiler.rs: await write_response_file in compile_one_source and build_include_flags (latter becomes async) - 24 toolchain/library test modules: drop unused `use crate::Package` imports that triggered #[deny(unused_imports)] under -D warnings Deploy (fbuild-deploy): - esp32/tests.rs: convert test_deploy_requires_port to #[tokio::test] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…9 from adc3013 over-correction)
Address the 36-error fix-up sweep where sync functions were still calling now-async APIs after the #813 async runtime migration. Changes by file: * `symbol_analyzer/mod.rs` — `demangle_batch` now async (calls async `run_command_with_stdin`); both call sites in `analyze_elf` await it and switch from `.unwrap_or_else` to explicit match (closures can't hold the awaited result cleanly). * `script_runtime_tests.rs` — added `.await` to every `find_python()` guard (now async) and to the `resolve_extra_script_overlay` call in `resolve_runtime_overlay`. Promoted `resolve_runtime_overlay` from `fn` to `async fn`. Converted four remaining `#[test]` fns that touch the runtime to `#[tokio::test] async fn` and awaited the helper. * `pipeline/library.rs` — converted all 7 `#[test]` fns in the inner test module that exercise the now-async `compile_project_as_library` to `#[tokio::test] async fn`, with `.await` on each call. * `compile_many.rs` — converted three `#[test]` fns that call the async `compile_many` to `#[tokio::test] async fn`. * `rp2040/orchestrator.rs` — `compile_boot2_object` now async (calls async `Compiler::compile_one`); awaited at the single caller in `build`. * `sam/orchestrator.rs` — `install_sam_core` and `install_samd_core` now async (both call async `Package::ensure_installed`). Awaited at the two call sites in `build`. * `esp32/orchestrator/build.rs` — replaced the leftover `ensure_libraries_sync(...).await?` (illegal: sync returns `Result`, not Future) with the async `ensure_libraries(...).await?` since the surrounding `build` is already async. Functions that became `async fn`: - `symbol_analyzer::demangle_batch` - `script_runtime_tests::resolve_runtime_overlay` - `rp2040::orchestrator::compile_boot2_object` - `sam::orchestrator::install_sam_core` - `sam::orchestrator::install_samd_core` Tests promoted to `#[tokio::test]`: 14 total - script_runtime_tests: 4 - pipeline/library: 7 - compile_many: 3 Verification deferred per task instructions (no cargo check/build/test in this commit — that's a follow-up step). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…u_for_mcu async - crates/fbuild-build/tests/eh_frame_strip_esp32.rs: #[tokio::test] + .await on orchestrator.build (×2) - crates/fbuild-build/tests/lite_scons_acceptance.rs: resolve_lite helper to async, 5 #[test] → #[tokio::test], all callers .await - crates/fbuild-build/tests/nxplpc_core_compile_commands.rs: build_core_repo helper to async, caller .await'd - crates/fbuild-build/tests/esp32_build.rs: incremental_build_at helper to async + caller .await - crates/fbuild-daemon/src/handlers/emulator/qemu_deploy.rs: resolve_esp_qemu_for_mcu → async fn, awaits pkg.resolve_executable - crates/fbuild-daemon/src/handlers/emulator/runners.rs: caller of resolve_esp_qemu_for_mcu .await'd - crates/fbuild-daemon/src/handlers/emulator/tests_process.rs: rewrite the Result::and_then chain (no longer composable since resolve_executable is async) into a sequential let-pkg + .await pattern - crates/fbuild-packages/src/library/library_manager.rs: fully-qualify JoinSet generic as std::result::Result<X, FbuildError> since fbuild_core::Result is the 1-arg type alias shadowing prelude Result Workspace `cargo check --all-targets` is clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedToo many files! This PR contains 161 files, which is 11 over the limit of 150. To get a review, narrow the scope: Upgrade to a paid plan to raise the limit. ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (161)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
) The zccache_hit_across_workspace_rename integration test was written against the wrapper-binary 'zccache wrap ...' path that was deleted in #800. compile_source ignores its _compiler_cache argument and routes every compile through the embedded ZccacheService, so the fake-zccache subprocess the test spawns is never invoked and its log assertions can never pass. Cache-hit-survives-workspace-rename coverage now belongs against the embedded service directly; the zccache_embedded_smoke.rs test covers the basic embedded path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Release the #813 full-async tokio runtime migration. The work landed across PR #823 (the migration itself, admin-merged), #824 (the local Docker Linux verify infra), and #825 (cargo fmt + LOC gate fixes for the migration's wide-ranging edits). This is the first release on the unified async runtime — every per-TU compile, every per-platform deploy, every subprocess invocation now drives through the daemon's single tokio runtime, making tokio-console visibility complete and removing the rayon / std::thread::scope hybrid that the migration replaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #813. Closes #814 #815 #816 #817 #819 #820 #821 (the
per-crate async-audit sub-issues).
What this does
Whole-app async migration. Every per-TU compile, every per-watch
fingerprint, every download, every flash-tool subprocess spawn now
lives on
fbuild-daemon's tokio runtime. The embeddedZccacheService(landed in v2.3.12 via #800) is no longer the only thing tokio-console
can see — it can now span the entire daemon.
Diff at a glance
Eight commits, ~120 files changed, ~3,500 lines of churn.
Foundation
crates/fbuild-core/src/subprocess.rs—run_commandfamilyrewritten to
async fnontokio::process::Command. Sync_blockingbridges retained for CLI diagnostic subcommands.
PR_SET_PDEATHSIG) preservedvia
tokio::process::Command::creation_flags+ the existingcontainment::tokio_spawn::spawn_containedadapter.crates/fbuild-core/src/response_file.rs—write_response_fileasync;_blockingretained for tests.async-trait = "0.1"added; consumed by every trait below.Trait conversions
BuildOrchestrator::build→ async (#[async_trait]). 13 per-platformorchestrators updated.
Compiler::compile_one/compile_c/compile_cpp/compiledefaults → async. 11 per-platform compilers updated.
Linker::archive/link/convert_firmware/report_size/link_all→ async. 10 per-platform linkers updated.PlatformSupport::install_deps→ async. 13 platforms.Deployer::deploy+post_deploy_recovery→ async. 4 per-platformdeployers (AVR, LPC, Teensy, ESP32) + 2 test impls.
Package::ensure_installed→ async. 32 impls (toolchains + cores +frameworks + libraries).
Hot-path conversions
compile_sourcecallsFbuildZccacheService::compile(...).awaitdirectly — no more
runtime.block_on(...)bridge.parallel::compile_sources_parallelandcompile_many::run_stage2rewritten from
std::thread::scope+Mutex<Iter>totokio::task::JoinSet+tokio::sync::Semaphore.pipeline/*modules all async.library_manager::ensure_librariesparallel download + transitive-dep resolve via
JoinSet.handlers/operations/*drop theirspawn_blockingwrappers; they
.awaitthe now-async orchestrator/deployer directly.Shared async HTTP client
crates/fbuild-packages/src/http.rswith aOnceLock<reqwest::Client>configured with 300s total + 30s connect timeout. Composes with the
timeout audit audit: blocking operations with no timeout in fbuild-packages (sub-issue of #802) #805.
reqwest::blocking::*call site (arduino_api.rs) converted.Serial + ESP32 native bridges (forced-sync wrapped)
serialport::*direct syscalls stay sync, individually wrapped intokio::task::spawn_blocking(the upstreamserialportcrate hasno async equivalent without
tokio_serial; that migration isdeferred — see audit: sync code that could be async in fbuild-serial (sub-issue of #813) #814 follow-up).
espflash(native ESP32 flash) is sync;Esp32Deployerwraps eachentry point in
spawn_blocking.baud_134_trigger,wait_for_disappearance,etc.) individually
spawn_blocking'd.Test infrastructure
crates/*/tests/+ in-file test modules promotedfrom
#[test]→#[tokio::test(flavor = "multi_thread", worker_threads = 4)].MockBuilderincompile_manytests adopts#[async_trait].tokio::sync::Barrierreplacesstd::thread::Barrierin compile-manyintegration tests.
env_lock()helper in emulator tests usestokio::sync::Mutexso theguard can be held across
.await(clippyawait_holding_lock).Local verification (Windows)
soldr cargo check --workspace --all-targetsclean.soldr cargo clippy --workspace --all-targets -- -D warningsclean.soldr cargo test --workspace --lib— 2,116 tests pass, 0 fail.soldr cargo dylint --all -- --workspace— running (will update PRwhen complete).
crates/*/tests/) — running.🤖 Generated with Claude Code